Overview

Dataset statistics

Number of variables: 20
Number of observations: 110148
Missing cells: 36827
Missing cells (%): 1.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 51.0 MiB
Average record size in memory: 485.9 B
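
The missing-cell figures can be cross-checked by hand. The sketch below is only an arithmetic check on numbers copied from this report (the overview table and the per-variable Missing counts for default and education); the variable names are illustrative and not part of pandas-profiling.

```python
# Figures copied from the overview table of this report.
n_variables = 20
n_observations = 110148
missing_cells = 36827

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Per-variable missing counts as listed in the variable sections.
missing_by_variable = {"default": 36349, "education": 478}

print(total_cells)                                          # 2202960
print(round(missing_pct, 1))                                # 1.7
print(sum(missing_by_variable.values()) == missing_cells)   # True
```

The two variables with missing values account for all 36827 missing cells.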

Variable types

NUM: 8
CAT: 7
BOOL: 5

Reproduction

Analysis started: 2020-07-30 06:34:38.459667
Analysis finished: 2020-07-30 06:38:15.758096
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

app_date has a high cardinality: 120 distinct values (High cardinality)
default has 36349 (33.0%) missing values (Missing)
decline_app_cnt has 91471 (83.0%) zeros (Zeros)
bki_request_cnt has 28908 (26.2%) zeros (Zeros)

Variables

df_index
Real number (ℝ≥0)

Distinct count73799
Unique (%)67.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean30719.7228
Minimum0
Maximum73798
Zeros2
Zeros (%)< 0.1%
Memory size860.7 KiB

Quantile statistics

Minimum0
5-th percentile2753.35
Q113768
median27536.5
Q346261.25
95-th percentile68290.65
Maximum73798
Range73798
Interquartile range (IQR)32493.25

Descriptive statistics

Standard deviation20443.7233
Coefficient of variation (CV)0.6654917896
Kurtosis-0.8889186034
Mean30719.7228
Median Absolute Deviation (MAD)17135.70993
Skewness0.4419381927
Sum3383716027
Variance417945822.4
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 36348.5 73798. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2047 2 < 0.1%
 
17556 2 < 0.1%
 
3229 2 < 0.1%
 
1180 2 < 0.1%
 
15515 2 < 0.1%
 
13466 2 < 0.1%
 
11417 2 < 0.1%
 
9368 2 < 0.1%
 
23703 2 < 0.1%
 
21654 2 < 0.1%
 
Other values (73789) 110128 > 99.9%
 
ValueCountFrequency (%) 
0 2 < 0.1%
 
1 2 < 0.1%
 
2 2 < 0.1%
 
3 2 < 0.1%
 
4 2 < 0.1%
 
ValueCountFrequency (%) 
73798 1 < 0.1%
 
73797 1 < 0.1%
 
73796 1 < 0.1%
 
73795 1 < 0.1%
 
73794 1 < 0.1%
 

client_id
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count110148
Unique (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean55074.5
Minimum1
Maximum110148
Zeros0
Zeros (%)0.0%
Memory size860.7 KiB

Quantile statistics

Minimum1
5-th percentile5508.35
Q127537.75
median55074.5
Q382611.25
95-th percentile104640.65
Maximum110148
Range110147
Interquartile range (IQR)55073.5

Descriptive statistics

Standard deviation31797.13306
Coefficient of variation (CV)0.5773476484
Kurtosis-1.2
Mean55074.5
Median Absolute Deviation (MAD)27537
Skewness0
Sum6066346026
Variance1011057671
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000e+00 1.10148e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2047 1 < 0.1%
 
97541 1 < 0.1%
 
93447 1 < 0.1%
 
70920 1 < 0.1%
 
72969 1 < 0.1%
 
66826 1 < 0.1%
 
68875 1 < 0.1%
 
79116 1 < 0.1%
 
81165 1 < 0.1%
 
75022 1 < 0.1%
 
Other values (110138) 110138 > 99.9%
 
ValueCountFrequency (%) 
1 1 < 0.1%
 
2 1 < 0.1%
 
3 1 < 0.1%
 
4 1 < 0.1%
 
5 1 < 0.1%
 
ValueCountFrequency (%) 
110148 1 < 0.1%
 
110147 1 < 0.1%
 
110146 1 < 0.1%
 
110145 1 < 0.1%
 
110144 1 < 0.1%
 

app_date
Categorical

HIGH CARDINALITY
Distinct count120
Unique (%)0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
18MAR2014 1491 1.4%
 
19MAR2014 1363 1.2%
 
17MAR2014 1350 1.2%
 
31MAR2014 1317 1.2%
 
07APR2014 1296 1.2%
 
02APR2014 1291 1.2%
 
11MAR2014 1245 1.1%
 
04MAR2014 1242 1.1%
 
01APR2014 1239 1.1%
 
11FEB2014 1233 1.1%
 
Other values (110) 97081 88.1%
 

Length

Max length9
Mean length9
Min length9
ValueCountFrequency (%) 
Decimal_Number 10 52.6%
 
Uppercase_Letter 9 47.4%
 
ValueCountFrequency (%) 
Common 10 52.6%
 
Latin 9 47.4%
 
ValueCountFrequency (%) 
ASCII 19 100.0%
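
All app_date values are 9 characters long and the listed ones (for example 18MAR2014) look like day, month abbreviation, year. Assuming every value follows that DDMMMYYYY pattern with English upper-case month abbreviations, a minimal sketch for turning the strings into dates could look like this; `parse_app_date` and `MONTHS` are hypothetical helper names, not part of the report or of pandas-profiling.

```python
from datetime import date

# Assumed month abbreviations (English, upper case), as seen in the report.
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_app_date(text):
    """Parse a 9-character DDMMMYYYY string such as '18MAR2014'."""
    if len(text) != 9:
        raise ValueError(f"expected 9 characters, got {text!r}")
    day = int(text[:2])
    month = MONTHS[text[2:5].upper()]  # KeyError for an unknown abbreviation
    year = int(text[5:])
    return date(year, month, day)


parsed = parse_app_date("18MAR2014")
print(parsed.isoformat())  # 2014-03-18
```

Parsed this way, the column can be treated as a date rather than a high-cardinality categorical.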
 

education
Categorical

Distinct count5
Unique (%)< 0.1%
Missing478
Missing (%)0.4%
Memory size860.7 KiB
ValueCountFrequency (%) 
SCH 57998 52.7%
 
GRD 34768 31.6%
 
UGR 14748 13.4%
 
PGR 1865 1.7%
 
ACD 291 0.3%
 
(Missing) 478 0.4%
 

Length

Max length3
Mean length3
Min length3
ValueCountFrequency (%) 
Uppercase_Letter 9 81.8%
 
Lowercase_Letter 2 18.2%
 
ValueCountFrequency (%) 
Latin 11 100.0%
 
ValueCountFrequency (%) 
ASCII 11 100.0%
 

sex
Categorical

Distinct count2
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
F 61836 56.1%
 
M 48312 43.9%
 

Length

Max length1
Mean length1
Min length1
ValueCountFrequency (%) 
Uppercase_Letter 2 100.0%
 
ValueCountFrequency (%) 
Latin 2 100.0%
 
ValueCountFrequency (%) 
ASCII 2 100.0%
 

age
Real number (ℝ≥0)

Distinct count52
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean39.24940988
Minimum21
Maximum72
Zeros0
Zeros (%)0.0%
Memory size860.7 KiB

Quantile statistics

Minimum21
5-th percentile24
Q130
median37
Q348
95-th percentile60
Maximum72
Range51
Interquartile range (IQR)18

Descriptive statistics

Standard deviation11.51806263
Coefficient of variation (CV)0.2934582371
Kurtosis-0.7260121183
Mean39.24940988
Median Absolute Deviation (MAD)9.695175763
Skewness0.4802480831
Sum4323244
Variance132.6657668
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[21. 21.5 22.5 23.5 24.5 ... 67.5 68.5 69.5 70.5 72. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
31 4084 3.7%
 
28 4035 3.7%
 
30 4035 3.7%
 
27 3964 3.6%
 
29 3940 3.6%
 
26 3780 3.4%
 
32 3773 3.4%
 
34 3548 3.2%
 
33 3499 3.2%
 
35 3386 3.1%
 
Other values (42) 72104 65.5%
 
ValueCountFrequency (%) 
21 1262 1.1%
 
22 1415 1.3%
 
23 2295 2.1%
 
24 2780 2.5%
 
25 3292 3.0%
 
ValueCountFrequency (%) 
72 2 < 0.1%
 
71 6 < 0.1%
 
70 60 0.1%
 
69 110 0.1%
 
68 261 0.2%
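
Several of the descriptive statistics for age follow directly from other figures in the same block. The sketch below recomputes them from values copied out of this report, as a sanity check only; the variable names are illustrative.

```python
# Values copied from the age block of this report.
n_observations = 110148
total = 4323244            # Sum
std = 11.51806263          # Standard deviation
q1, q3 = 30, 48            # Q1 and Q3
minimum, maximum = 21, 72

mean = total / n_observations   # Mean = Sum / number of observations
cv = std / mean                 # Coefficient of variation = std / mean
iqr = q3 - q1                   # Interquartile range
value_range = maximum - minimum # Range
variance = std ** 2             # Variance = std squared

print(round(mean, 4))      # 39.2494
print(round(cv, 4))        # 0.2935
print(iqr)                 # 18
print(value_range)         # 51
print(round(variance, 3))  # 132.666
```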
 

car
Boolean

Distinct count2
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
N 74290 67.4%
 
Y 35858 32.6%
 

car_type
Boolean

Distinct count2
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
N 89140 80.9%
 
Y 21008 19.1%
 

decline_app_cnt
Real number (ℝ≥0)

ZEROS
Distinct count24
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.2732051422
Minimum0
Maximum33
Zeros91471
Zeros (%)83.0%
Memory size860.7 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile2
Maximum33
Range33
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.799099319
Coefficient of variation (CV)2.924905851
Kurtosis101.2380998
Mean0.2732051422
Median Absolute Deviation (MAD)0.4537594429
Skewness6.493006696
Sum30093
Variance0.6385597216
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 7.5 9.5 11.5 14.5 33. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 91471 83.0%
 
1 12500 11.3%
 
2 3622 3.3%
 
3 1365 1.2%
 
4 606 0.6%
 
5 255 0.2%
 
6 156 0.1%
 
7 58 0.1%
 
8 37 < 0.1%
 
9 29 < 0.1%
 
Other values (14) 49 < 0.1%
 
ValueCountFrequency (%) 
0 91471 83.0%
 
1 12500 11.3%
 
2 3622 3.3%
 
3 1365 1.2%
 
4 606 0.6%
 
ValueCountFrequency (%) 
33 1 < 0.1%
 
30 1 < 0.1%
 
24 1 < 0.1%
 
22 1 < 0.1%
 
21 1 < 0.1%
 

good_work
Boolean

Distinct count2
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
0 91917 83.4%
 
1 18231 16.6%
 

score_bki
Real number (ℝ)

Distinct count102618
Unique (%)93.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-1.904535049
Minimum-3.62458632
Maximum0.19977285
Zeros0
Zeros (%)0.0%
Memory size860.7 KiB

Quantile statistics

Minimum-3.62458632
5-th percentile-2.696247185
Q1-2.26043367
median-1.92082293
Q3-1.567888152
95-th percentile-1.055049083
Maximum0.19977285
Range3.82435917
Interquartile range (IQR)0.6925455175

Descriptive statistics

Standard deviation0.4993974924
Coefficient of variation (CV)-0.2622149131
Kurtosis-0.1492918934
Mean-1.904535049
Median Absolute Deviation (MAD)0.4026105377
Skewness0.1939872976
Sum-209780.7266
Variance0.2493978554
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-3.62458632 -3.40235635 -3.21728287 -3.21663488 -3.15990211 ... -0.6388378 -0.52401387 -0.37380806 -0.15264482 0.19977285], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
-1.77526279 517 0.5%
 
-2.1042109 454 0.4%
 
-2.22500363 424 0.4%
 
-2.16966378 375 0.3%
 
-2.02410005 278 0.3%
 
-1.92082293 270 0.2%
 
-2.38726804 238 0.2%
 
-1.52642194 207 0.2%
 
-2.44723899 207 0.2%
 
-2.2729409 176 0.2%
 
Other values (102608) 107002 97.1%
 
ValueCountFrequency (%) 
-3.62458632 1 < 0.1%
 
-3.59798083 1 < 0.1%
 
-3.58258691 1 < 0.1%
 
-3.57419708 1 < 0.1%
 
-3.56422406 1 < 0.1%
 
ValueCountFrequency (%) 
0.19977285 2 < 0.1%
 
0.1980699 1 < 0.1%
 
0.18882044 1 < 0.1%
 
0.18361297 1 < 0.1%
 
0.16854933 1 < 0.1%
 

bki_request_cnt
Real number (ℝ≥0)

ZEROS
Distinct count40
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.00500236
Minimum0
Maximum53
Zeros28908
Zeros (%)26.2%
Memory size860.7 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median1
Q33
95-th percentile6
Maximum53
Range53
Interquartile range (IQR)3

Descriptive statistics

Standard deviation2.266925867
Coefficient of variation (CV)1.130635012
Kurtosis23.16785082
Mean2.00500236
Median Absolute Deviation (MAD)1.552358663
Skewness3.082728152
Sum220847
Variance5.138952887
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 16.5 19.5 24.5 33.5 53. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 28908 26.2%
 
1 27295 24.8%
 
2 20481 18.6%
 
3 13670 12.4%
 
4 8406 7.6%
 
5 4960 4.5%
 
6 2500 2.3%
 
7 1292 1.2%
 
8 735 0.7%
 
9 459 0.4%
 
Other values (30) 1442 1.3%
 
ValueCountFrequency (%) 
0 28908 26.2%
 
1 27295 24.8%
 
2 20481 18.6%
 
3 13670 12.4%
 
4 8406 7.6%
 
ValueCountFrequency (%) 
53 1 < 0.1%
 
47 1 < 0.1%
 
46 1 < 0.1%
 
45 1 < 0.1%
 
41 1 < 0.1%
 

region_rating
Real number (ℝ≥0)

Distinct count7
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean56.75118931
Minimum20
Maximum80
Zeros0
Zeros (%)0.0%
Memory size860.7 KiB

Quantile statistics

Minimum20
5-th percentile40
Q150
median50
Q360
95-th percentile80
Maximum80
Range60
Interquartile range (IQR)10

Descriptive statistics

Standard deviation13.06592289
Coefficient of variation (CV)0.2302317017
Kurtosis-0.6334345368
Mean56.75118931
Median Absolute Deviation (MAD)10.90200861
Skewness0.4778692262
Sum6251030
Variance170.7183409
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[20. 35. 45. 55. 65. 75. 80.], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
50 40981 37.2%
 
60 23999 21.8%
 
40 17947 16.3%
 
80 17170 15.6%
 
70 9304 8.4%
 
30 434 0.4%
 
20 313 0.3%
 
ValueCountFrequency (%) 
20 313 0.3%
 
30 434 0.4%
 
40 17947 16.3%
 
50 40981 37.2%
 
60 23999 21.8%
 
ValueCountFrequency (%) 
80 17170 15.6%
 
70 9304 8.4%
 
60 23999 21.8%
 
50 40981 37.2%
 
40 17947 16.3%
 

home_address
Categorical

Distinct count3
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
2 59591 54.1%
 
1 48688 44.2%
 
3 1869 1.7%
 

Length

Max length1
Mean length1
Min length1
ValueCountFrequency (%) 
Decimal_Number 3 100.0%
 
ValueCountFrequency (%) 
Common 3 100.0%
 
ValueCountFrequency (%) 
ASCII 3 100.0%
 

work_address
Categorical

Distinct count3
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
3 67113 60.9%
 
2 30761 27.9%
 
1 12274 11.1%
 

Length

Max length1
Mean length1
Min length1
ValueCountFrequency (%) 
Decimal_Number 3 100.0%
 
ValueCountFrequency (%) 
Common 3 100.0%
 
ValueCountFrequency (%) 
ASCII 3 100.0%
 

income
Real number (ℝ≥0)

Distinct count1207
Unique (%)1.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean41012.64854
Minimum1000
Maximum1000000
Zeros0
Zeros (%)0.0%
Memory size860.7 KiB

Quantile statistics

Minimum1000
5-th percentile10000
Q120000
median30000
Q348000
95-th percentile100000
Maximum1000000
Range999000
Interquartile range (IQR)28000

Descriptive statistics

Standard deviation45399.73505
Coefficient of variation (CV)1.10696911
Kurtosis100.1746159
Mean41012.64854
Median Absolute Deviation (MAD)23532.68019
Skewness7.503020095
Sum4517461211
Variance2061135943
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1000. 1050. 4990. 5050. 5150. ... 590000. 615000. 999499.5 999999.5 1000000. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
30000 10437 9.5%
 
25000 9090 8.3%
 
20000 8174 7.4%
 
40000 7383 6.7%
 
50000 6742 6.1%
 
35000 6319 5.7%
 
15000 5874 5.3%
 
60000 3818 3.5%
 
45000 3670 3.3%
 
18000 2732 2.5%
 
Other values (1197) 45909 41.7%
 
ValueCountFrequency (%) 
1000 6 < 0.1%
 
1100 1 < 0.1%
 
1200 1 < 0.1%
 
1500 2 < 0.1%
 
1700 1 < 0.1%
 
ValueCountFrequency (%) 
1000000 13 < 0.1%
 
999999 4 < 0.1%
 
999000 2 < 0.1%
 
990000 1 < 0.1%
 
950000 4 < 0.1%
 

sna
Categorical

Distinct count4
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
1 70681 64.2%
 
4 17481 15.9%
 
2 15832 14.4%
 
3 6154 5.6%
 

Length

Max length1
Mean length1
Min length1
ValueCountFrequency (%) 
Decimal_Number 4 100.0%
 
ValueCountFrequency (%) 
Common 4 100.0%
 
ValueCountFrequency (%) 
ASCII 4 100.0%
 

first_time
Categorical

Distinct count4
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
3 46588 42.3%
 
4 28017 25.4%
 
1 18296 16.6%
 
2 17247 15.7%
 

Length

Max length1
Mean length1
Min length1
ValueCountFrequency (%) 
Decimal_Number 4 100.0%
 
ValueCountFrequency (%) 
Common 4 100.0%
 
ValueCountFrequency (%) 
ASCII 4 100.0%
 

foreign_passport
Boolean

Distinct count2
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
N 93721 85.1%
 
Y 16427 14.9%
 

default
Boolean

MISSING
Distinct count2
Unique (%)< 0.1%
Missing36349
Missing (%)33.0%
Memory size860.7 KiB
ValueCountFrequency (%) 
0 64427 58.5%
 
1 9372 8.5%
 
(Missing) 36349 33.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
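
The calculation described above can be sketched in plain Python. This is a minimal illustration of the formula, not the code pandas-profiling runs; the function name `pearson_r` and the sample data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of (squared) deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746

# Shifting and (positively) rescaling one variable leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], ys)
print(abs(r - r_shifted) < 1e-12)  # True
```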

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
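
As a rough sketch of that procedure: replace each value by its rank, then apply the Pearson formula to the ranks. The code below assumes tied values share the average of their ranks, which is a common convention but is not stated in this report; `average_ranks` and `spearman_rho` are illustrative names.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A nonlinear but strictly increasing relationship gives rho = 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
print(rho)                                # 1.0
print(average_ranks([10, 20, 20, 30]))    # [1.0, 2.5, 2.5, 4.0]
```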

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
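
The pair counting can be sketched directly. The code below follows the formula exactly as worded here, which has no tie correction (this form is often called tau-a); tie-corrected variants such as tau-b exist and may give different values when the data contain ties. `kendall_tau` is an illustrative name.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables move in the same direction
            concordant += 1
        elif product < 0:    # the variables move in opposite directions
            discordant += 1
        # product == 0 is a tie and counts as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Of the 6 pairs, 5 are concordant and 1 is discordant: (5 - 1) / 6.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau, 4))  # 0.6667
```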

Missing values

Sample

First rows

row  df_index  client_id   app_date  education  sex  age  car  car_type  decline_app_cnt  good_work  score_bki  bki_request_cnt  region_rating  home_address  work_address  income  sna  first_time  foreign_passport  default
  0         0      25905  01FEB2014        SCH    M   62    Y         Y                0          0  -2.008753                1             50             1             2   18000    4           1                 N      0.0
  1         1      63161  12MAR2014        SCH    F   59    N         N                0          0  -1.532276                3             50             2             3   19000    4           1                 N      0.0
  2         2      25887  01FEB2014        SCH    M   25    Y         N                2          0  -1.408142                1             80             1             2   30000    1           4                 Y      0.0
  3         3      16222  23JAN2014        SCH    F   53    N         N                0          0  -2.057471                2             50             2             3   10000    1           3                 N      0.0
  4         4     101655  18APR2014        GRD    M   48    N         N                0          1  -1.244723                1             60             2             3   30000    1           4                 Y      0.0
  5         5      41415  18FEB2014        SCH    M   27    Y         N                0          1  -2.032257                0             50             1             1   15000    2           3                 N      0.0
  6         6      28436  04FEB2014        SCH    M   39    N         N                0          0  -2.225004                0             60             1             2   28000    1           1                 N      0.0
  7         7      68769  17MAR2014        SCH    F   39    N         N                0          0  -1.522739                1             50             2             3   45000    3           3                 N      0.0
  8         8      38424  14FEB2014        SCH    F   50    Y         N                1          0  -1.676061                0             50             1             1   30000    1           4                 N      0.0
  9         9       4496  10JAN2014        UGR    F   54    N         N                0          0  -2.695176                1             50             2             3   24000    1           3                 N      0.0

Last rows

row     df_index  client_id   app_date  education  sex  age  car  car_type  decline_app_cnt  good_work  score_bki  bki_request_cnt  region_rating  home_address  work_address  income  sna  first_time  foreign_passport  default
110138     36339      16072  23JAN2014        GRD    F   28    N         N                0          0  -1.651781                4             60             1             2   13000    1           3                 N      NaN
110139     36340      10090  17JAN2014        SCH    F   53    Y         N                0          0  -1.845058                2             50             1             2    7000    1           1                 N      NaN
110140     36341      90435  07APR2014        UGR    F   48    N         N                0          0  -2.066300                1             60             1             1   27000    1           4                 N      NaN
110141     36342      42509  19FEB2014        SCH    F   58    Y         Y                0          1  -1.857117                1             50             2             3   25000    4           3                 N      NaN
110142     36343      72405  20MAR2014        SCH    F   40    N         N                0          0  -2.039905                0             50             2             3   20000    4           1                 N      NaN
110143     36344      83775  31MAR2014        SCH    F   37    N         N                1          0  -1.744976                3             50             2             3   15000    4           1                 N      NaN
110144     36345     106254  25APR2014        GRD    F   64    Y         Y                0          0  -2.293781                3             60             1             2  200000    1           4                 N      NaN
110145     36346      81852  30MAR2014        GRD    M   31    N         N                2          0  -0.940752                1             50             1             2   60000    4           2                 N      NaN
110146     36347       1971  07JAN2014        UGR    F   27    N         N                1          0  -1.242392                2             80             2             3   30000    1           1                 N      NaN
110147     36348      69044  17MAR2014        SCH    M   38    N         N                0          0  -1.507549                2             50             1             2   15000    4           2                 N      NaN